 dopamine system


A Rewarding Line of Work

Communications of the ACM

As an undergraduate at Stanford University in the mid-1970s, Richard Sutton pored through the school's library, trying to read everything he could about learning and machine intelligence. What he found disappointed him, because he did not think it really got to the heart of the matter. "It was mostly pattern recognition. It was mostly learning from examples. And I knew from psychology that animals do very different things," Sutton said.


Timing and Partial Observability in the Dopamine System

Neural Information Processing Systems

We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. By combining statistical model-based learning with a physiologically grounded TD theory, the model also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
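The idea of inferring a hidden state and then learning reward predictions over it with TD can be illustrated with a small sketch. This is not the paper's model: it assumes a discrete-time hidden Markov model rather than a semi-Markov process, and a value function that is linear in the belief vector. The function names `belief_update` and `td_update`, the matrices `T` and `O`, and the parameters `alpha` and `gamma` are all hypothetical names chosen for illustration.

```python
def belief_update(belief, obs, T, O):
    """One step of Bayesian filtering over hidden states (illustrative only).

    T[i][j] is the assumed probability of moving from hidden state i to j.
    O[j][obs] is the assumed probability of seeing `obs` in hidden state j.
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each state by how well it explains the observation.
    unnorm = [predicted[j] * O[j][obs] for j in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def td_update(w, belief, reward, next_belief, alpha=0.1, gamma=0.9):
    """TD(0) update for a value estimate that is linear in the belief.

    The value of a belief is the dot product of w and the belief vector.
    Returns the updated weights and the TD error (the prediction error).
    """
    v = sum(wi * bi for wi, bi in zip(w, belief))
    v_next = sum(wi * bi for wi, bi in zip(w, next_belief))
    delta = reward + gamma * v_next - v
    new_w = [wi + alpha * delta * bi for wi, bi in zip(w, belief)]
    return new_w, delta


# Hypothetical two-state example: state 0 = waiting, state 1 = cue period.
T = [[0.5, 0.5], [0.5, 0.5]]
O = [[0.9, 0.1], [0.2, 0.8]]  # columns: observation 0 (nothing), 1 (cue)
b0 = [1.0, 0.0]
b1 = belief_update(b0, 1, T, O)
w, delta = td_update([0.0, 0.0], b0, 1.0, b1, alpha=0.5)
```

In this sketch the TD error `delta` plays the role that such models assign to the dopamine signal, and the belief vector stands in for the tapped delay line as the input to the value estimate.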